Keyword [Multi-Scale DenseNet]
Huang G., Chen D., Li T., et al. Multi-scale dense convolutional networks for efficient prediction. arXiv preprint arXiv:1703.09844, 2017.
1. Overview
1.1. Motivation
- Small models handle easy examples well but make mistakes on hard ones
- Hard examples need large models, but large models waste computation on easy examples
- Only the last layer's features are used for classification; early-layer features are not directly suitable for it
- Early layers produce fine-scale (high-resolution) features, later layers coarse-scale features
In this paper, the Multi-Scale DenseNet (MSDNet) is proposed, which automatically
- spends little computation on easy examples
- spends more computation on hard examples
It targets two settings
- anytime classification
- budgeted batch classification (early exit via a probability threshold)
It has two key features (see the sketch after this list)
- multi-scale feature maps + multiple early-exit classifiers
- dense connectivity
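A minimal sketch of how the two ideas combine in one layer, assuming PyTorch; the class name, channel sizes, and growth rate are illustrative and not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class TwoScaleDenseLayer(nn.Module):
    """One layer maintaining a fine and a coarse scale with dense connectivity (illustrative)."""
    def __init__(self, c_fine, c_coarse, growth=16):
        super().__init__()
        # fine scale: regular conv over all fine-scale features produced so far
        self.fine_conv = nn.Conv2d(c_fine, growth, kernel_size=3, padding=1)
        # coarse scale: strided conv from the fine scale concatenated with a conv on the coarse scale
        self.down_conv = nn.Conv2d(c_fine, growth // 2, kernel_size=3, stride=2, padding=1)
        self.coarse_conv = nn.Conv2d(c_coarse, growth // 2, kernel_size=3, padding=1)

    def forward(self, fine, coarse):
        new_fine = torch.relu(self.fine_conv(fine))
        new_coarse = torch.relu(torch.cat([self.down_conv(fine), self.coarse_conv(coarse)], dim=1))
        # dense connectivity: new features are concatenated onto all previous features at each scale
        return torch.cat([fine, new_fine], dim=1), torch.cat([coarse, new_coarse], dim=1)

# e.g. fine 1x16x32x32 and coarse 1x32x16x16 -> outputs with 16+16 and 32+16 channels
layer = TwoScaleDenseLayer(c_fine=16, c_coarse=32)
fine_out, coarse_out = layer(torch.randn(1, 16, 32, 32), torch.randn(1, 32, 16, 16))
```

An early-exit classifier would then predict from the coarsest-scale features, which is what makes intermediate classification viable.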
1.2. Contribution
- First deep learning architecture of its kind that allows dynamic resource adaptation within a single model
- First to show that dense connectivity is crucial for early-exit classifiers
1.3. Setting
1.3.1. Anytime prediction
- the network can be stopped at any point in time (when the budget is exhausted)
- it returns the most recent prediction
- the budget is nondeterministic and varies per test instance
Notation (used in the objective below)
- $L$. a suitable loss function
- $B$. computational budget
- $f$. the model
- $x$. input image
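With these symbols, the anytime objective can be written as the expected loss under the joint distribution $P$ over inputs and budgets; the display form below is a reconstruction from the definitions above:

$$
L(f) \;=\; \mathbb{E}\big[\, L(f(x),\, B) \,\big]_{P(x,\,B)}
$$

where $L(f(x), B)$ is the loss of the prediction that $f$ has produced for $x$ by the time budget $B$ runs out.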
1.3.2. Budgeted batch classification
- a total budget $B$ is shared across a batch of $M$ test examples; stop on each example once the prediction is sufficiently confident (see the sketch below)
- less than $B/M$ computation for easy examples
- more than $B/M$ computation for hard examples
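A minimal sketch of the threshold-based early-exit rule, assuming PyTorch; `stages`, `classifiers`, and the threshold value are illustrative assumptions, not the paper's code or exact thresholding procedure:

```python
import torch

@torch.no_grad()
def budgeted_batch_predict(stages, classifiers, images, threshold=0.9):
    """stages[k] extends the features computed so far; classifiers[k] predicts from them."""
    predictions = []
    for x in images:
        feats = x.unsqueeze(0)
        for stage, clf in zip(stages, classifiers):
            feats = stage(feats)                 # pay only for the layers up to this exit
            probs = clf(feats).softmax(dim=-1)
            if probs.max() >= threshold:         # confident enough: exit early,
                break                            # remaining (more expensive) stages are skipped
        predictions.append(probs.argmax(dim=-1))
    return torch.cat(predictions)
```

In the paper the threshold is chosen on a validation set so that the whole batch stays within the total budget $B$; a fixed value like 0.9 here is only for illustration.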
1.4. Related Work
- Computation-efficient
- weight pruning
- weight quantization
- compact models
- knowledge distillation
- Resource-efficient
- FractalNet
- Adaptive computation time approaches
1.5. Visualization
1.6. Future Work
- Extend MSDNet to other tasks, e.g. semantic segmentation
- Combine MSDNet with model compression, spatially adaptive computation, and more efficient convolution operations
2. Multi-Scale DenseNet
2.1. Problems
Lack of coarse-level features
- The accuracy of an intermediate classifier is correlated with its position within the network; early layers lack the coarse-level features needed for classification
- Solution. multi-scale feature maps
Early classifiers interfere with later classifiers (the intermediate loss degrades the features used by the final classifier)
- Solution. dense connectivity
2.2. Model
2.3. Lazy Evaluation
Finer-scale feature maps that do not influence the prediction of the next classifier are not computed at test time, which saves computation when exiting early.
2.4. Loss Function
The network is trained with a weighted cumulative cross-entropy loss over all classifiers; empirically, setting all weights $w_k = 1$ works well.
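A reconstruction of that cumulative loss in display form; $\mathcal{D}$ (the training set) and $f_k$ (the $k$-th classifier) are my notation:

$$
L_{\text{total}} \;=\; \frac{1}{|\mathcal{D}|} \sum_{(x,\, y) \in \mathcal{D}} \; \sum_{k} w_k \, L\big(f_k(x),\, y\big), \qquad w_k = 1 \text{ for all } k.
$$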